This report explains the process I followed in the dengue competition. The goal is to predict the number of dengue cases each week in two locations, based on environmental variables describing changes in temperature, precipitation, vegetation, and more.
You can read more about this competition here.
After renaming some variables and converting them to the appropriate types (factor, numeric, datetime), we split the dataset by city: Iquitos (data_iq) and San Juan (data_sj).
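The type conversion and the split by city can be sketched as follows. This is a minimal illustration on toy data, not the actual preprocessing code: the column name city and its codes "sj" and "iq" are assumptions made for the example, and only the names data_sj, data_iq, week_start_date and total_cases come from the report.

```r
# Toy stand-in for the competition data. The `city` column and its
# codes "sj"/"iq" are assumptions made for this sketch.
data <- data.frame(
  city = c("sj", "sj", "iq", "iq"),
  week_start_date = c("1990-04-30", "1990-05-07", "2000-07-01", "2000-07-08"),
  total_cases = c("4", "5", "0", "1"),
  stringsAsFactors = FALSE
)

# Convert each column to the type it should have
data$city <- factor(data$city)
data$week_start_date <- as.Date(data$week_start_date)
data$total_cases <- as.numeric(data$total_cases)

# One data frame per city
data_sj <- data[data$city == "sj", ]
data_iq <- data[data$city == "iq", ]
```

Keeping one data frame per city makes it possible to explore and model each location separately.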
Let’s compare the temperature distributions in Iquitos
#### 2.2. TEMPERATURE IQ & SJ _____________________________________________ ####
# Get the temperature variables
temp_var <- grep("temp", names(data), value = TRUE)
temp_var_k<-c("temp_air_mean_r","temp_air_avg_r","temp_dewpoint_r","temp_max_r",
"temp_min_r")
# Change everything to celsius
data_iq[temp_var_k]<- kelvin.to.celsius(data_iq[temp_var_k], round = 2)
data_sj[temp_var_k]<- kelvin.to.celsius(data_sj[temp_var_k], round = 2)
rm(temp_var_k)
# Temperature variables train & validation
geom.density.function(data=data_iq, variables=c(temp_var[1:10], "source"),
fill="source")
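For readers without the conversion helper at hand, the Kelvin-to-Celsius step can be reproduced in base R. This is a sketch under the assumption that kelvin.to.celsius (which looks like the function of that name in the weathermetrics package) simply subtracts 273.15 and rounds; kelvin_to_celsius and the toy values below are hypothetical names and data used only for illustration.

```r
# Base-R sketch of the Kelvin-to-Celsius step: C = K - 273.15,
# rounded to a fixed number of digits.
kelvin_to_celsius <- function(x, digits = 2) {
  round(x - 273.15, digits)
}

# Toy data in Kelvin (values invented for the example)
toy <- data.frame(
  temp_max_r = c(300.15, 305.65),
  temp_min_r = c(293.15, 295.40)
)

# Convert only the selected columns, leaving the rest untouched
cols <- c("temp_max_r", "temp_min_r")
toy[cols] <- lapply(toy[cols], kelvin_to_celsius)
```

Converting only the listed columns matters here because, judging by temp_var_k, only some of the temperature variables need converting.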
Let’s compare the temperature distributions in San Juan
geom.density.function(data=data_sj, variables=c(temp_var[1:10], "source"),
fill="source")
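The helper geom.density.function is a custom function whose definition is not shown in this report. A minimal sketch of what such a helper might look like is given below, assuming ggplot2 is available; density_by_group is a hypothetical name, and the labels "train" and "validation" in the source column are assumed for the toy data.

```r
library(ggplot2)

# Hypothetical stand-in for the report's density helper: one density
# panel per variable, coloured by the grouping column named in `fill`.
density_by_group <- function(data, variables, fill) {
  num_vars <- setdiff(variables, fill)
  # Stack the numeric columns into long format (variable, value, group)
  long <- data.frame(
    variable = rep(num_vars, each = nrow(data)),
    value = unlist(data[num_vars], use.names = FALSE),
    group = rep(data[[fill]], times = length(num_vars))
  )
  ggplot(long, aes(x = value, fill = group)) +
    geom_density(alpha = 0.5, na.rm = TRUE) +
    facet_wrap(~ variable, scales = "free") +
    labs(fill = fill)
}

# Toy data: two temperature variables and an assumed train/validation flag
set.seed(1)
toy <- data.frame(
  temp_max_r = rnorm(100, mean = 30, sd = 2),
  temp_min_r = rnorm(100, mean = 22, sd = 2),
  source = rep(c("train", "validation"), each = 50)
)
p <- density_by_group(toy, c("temp_max_r", "temp_min_r", "source"),
                      fill = "source")
```

Overlaying the two groups in each panel makes it easy to spot variables whose distribution differs between the training data and the data to be predicted.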
Let’s plot all temperature variables together for Iquitos
# Plot the progression of the temperature variables
plotly.line.function(data_iq,variables= c(temp_var[1:10], "total_cases",
"week_start_date"), x="week_start_date")
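Like the density helper, plotly.line.function is a custom function that is not defined in this report. The sketch below shows one way such a helper could be written with the plotly package; line_plot is a hypothetical name and the toy values are invented for the example.

```r
library(plotly)

# Hypothetical stand-in for the report's line-plot helper: one line per
# variable, all drawn against the column named in `x`.
line_plot <- function(data, variables, x) {
  y_vars <- setdiff(variables, x)
  p <- plot_ly()
  for (v in y_vars) {
    # Vectors are passed directly, so each trace keeps its own data
    p <- add_trace(p, x = data[[x]], y = data[[v]], name = v,
                   type = "scatter", mode = "lines")
  }
  p
}

# Toy weekly series (values invented for the example)
toy <- data.frame(
  week_start_date = seq(as.Date("2000-07-01"), by = "week", length.out = 5),
  temp_max_r = c(30.1, 31.4, 29.8, 30.6, 31.0),
  total_cases = c(0, 1, 3, 2, 5)
)
p <- line_plot(toy, c("temp_max_r", "total_cases", "week_start_date"),
               x = "week_start_date")
```

Because every series shares one y-axis, variables on very different scales are hard to compare in this kind of plot unless they are rescaled first.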
Let’s plot all temperature variables together for San Juan (total_cases, the dependent variable, is divided by 10 for visualization purposes only)
datatemp_sj <- data_sj
datatemp_sj$total_cases<-datatemp_sj$total_cases/10
plotly.line.function(datatemp_sj,variables= c(temp_var[1:10], "total_cases",
"week_start_date"), x="week_start_date")
rm(temp_var, datatemp_sj)
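The rescaling trick used for San Juan can be wrapped in a small helper so that the original data is never modified. This is only a sketch: rescale_for_plot is a hypothetical name, and the toy values are invented.

```r
# Return a copy of `data` with the selected columns multiplied by `factor`.
# The input data frame is left unchanged, so the rescaled copy can be
# used for plotting and then discarded.
rescale_for_plot <- function(data, cols, factor) {
  out <- data
  out[cols] <- out[cols] * factor
  out
}

# Toy data: case counts are an order of magnitude above the temperatures
toy <- data.frame(
  total_cases = c(10, 250, 40),
  temp_max_r = c(30.1, 31.4, 29.8)
)
scaled <- rescale_for_plot(toy, "total_cases", 1 / 10)
```

The factor is chosen by eye so that the series share a readable y-axis; it changes only the plot, not the values used for modelling.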
Let’s compare the humidity and precipitation distributions in Iquitos
# Get the humidity and precipitation variables
humid_precip_var <- grep("humid|precip", names(data), value = TRUE)
# Humidity and precipitation variables: train & validation
geom.density.function(data=data_iq, variables=c(humid_precip_var, "source"),
fill="source")
Let’s compare the humidity and precipitation distributions in San Juan
geom.density.function(data=data_sj, variables=c(humid_precip_var, "source"),
fill="source")
Let’s plot all humidity and precipitation variables together for Iquitos
# Plot the progression of the humid & precip variables
plotly.line.function(data_iq,variables= c(humid_precip_var, "total_cases",
"week_start_date"), x="week_start_date")
Let’s plot all humidity and precipitation variables together for San Juan
# Plot the progression of the humid & precip variables
plotly.line.function(data_sj,variables= c(humid_precip_var, "total_cases",
"week_start_date"), x="week_start_date")
rm(humid_precip_var)
Let’s compare the vegetation distributions in Iquitos
# Get the vegetation variables
var_veg <- grep("ndvi", names(data), value = TRUE)
# Vegetation variables: train & validation
geom.density.function(data=data_iq, variables=c(var_veg, "source"),
fill="source")
Let’s compare the vegetation distributions in San Juan
geom.density.function(data=data_sj, variables=c(var_veg, "source"),
fill="source")
Let’s plot all vegetation variables together for Iquitos (the vegetation variables are multiplied by 100 for visualization purposes only)
# Plot the progression of the veg variables
dataveg_iq <- data_iq
dataveg_iq[var_veg]<-dataveg_iq[var_veg]*100
plotly.line.function(dataveg_iq,variables= c(var_veg, "total_cases",
"week_start_date"), x="week_start_date")
Let’s plot all vegetation variables together for San Juan (the vegetation variables are multiplied by 1000 for visualization purposes only)
dataveg_sj <- data_sj
dataveg_sj[var_veg]<-dataveg_sj[var_veg]*1000
plotly.line.function(dataveg_sj,variables= c(var_veg, "total_cases",
"week_start_date"), x="week_start_date")
rm(var_veg,dataveg_iq, dataveg_sj)